The Sørensen index, also known as Sørensen’s similarity coefficient, is a statistic used for comparing the similarity of two samples. It was developed by the botanist Thorvald Sørensen and published in 1948.[1]
The name is often misspelled as Sorenson index, Soerenson index, or Sörenson index; each of these variants also occurs with the correct ending -sen.
Sørensen's original formula was intended to be applied to presence/absence data, and is

$$QS = \frac{2C}{A + B}$$

where A and B are the numbers of species in samples A and B, respectively, and C is the number of species shared by the two samples. QS is the quotient of similarity and ranges from 0 to 1. This expression is easily extended to species abundance instead of presence/absence. This quantitative version of the Sørensen index is also known as the Czekanowski index. The Sørensen index is identical to Dice's coefficient,[2] which likewise always lies in the range [0, 1]. Used as a distance measure, 1 − QS, the Sørensen index is identical to Hellinger distance and Bray-Curtis dissimilarity[3] when applied to quantitative data.
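The definitions above can be illustrated with a minimal Python sketch. It takes QS to be twice the number of shared species divided by the sum of the two species counts, and, for abundance data, assumes the usual quantitative form in which the shared part is the sum of the smaller abundance of each species. The function names and the example species lists are hypothetical and chosen only for illustration.

```python
def sorensen_index(sample_a, sample_b):
    """Presence/absence Sørensen index QS = 2C / (A + B)."""
    a, b = set(sample_a), set(sample_b)
    if not a and not b:
        raise ValueError("QS is undefined for two empty samples")
    shared = len(a & b)  # C: species present in both samples
    return 2 * shared / (len(a) + len(b))


def quantitative_sorensen(abund_a, abund_b):
    """Abundance-based version (assumed form): 2 * sum(min) / (total_a + total_b)."""
    total = sum(abund_a.values()) + sum(abund_b.values())
    if total == 0:
        raise ValueError("QS is undefined when both samples are empty")
    species = set(abund_a) | set(abund_b)
    # Shared abundance: the smaller of the two counts for each species.
    shared = sum(min(abund_a.get(s, 0), abund_b.get(s, 0)) for s in species)
    return 2 * shared / total


# Hypothetical presence/absence data: A = 4, B = 3, C = 2, so QS = 4/7.
site_1 = {"oak", "birch", "pine", "ash"}
site_2 = {"oak", "pine", "elm"}
qs = sorensen_index(site_1, site_2)

# Hypothetical abundance data: shared abundance is 3 + 3 = 6, totals are 10 + 10.
counts_1 = {"oak": 5, "birch": 2, "pine": 3}
counts_2 = {"oak": 3, "pine": 6, "elm": 1}
qs_quant = quantitative_sorensen(counts_1, counts_2)

# The corresponding distance measure is 1 - QS.
distance = 1 - qs_quant
```

With these inputs the presence/absence index is 4/7 (about 0.57) and the quantitative index is 0.6, giving a distance of 0.4. Both functions raise an error for two empty samples, since the denominator would be zero.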
The Sørensen coefficient is mainly useful for ecological community data (e.g. Looman & Campbell, 1960[4]). Justification for its use is primarily empirical rather than theoretical, although it can be justified theoretically as the intersection of two fuzzy sets.[5] Compared with Euclidean distance, Sørensen distance retains sensitivity in more heterogeneous data sets and gives less weight to outliers.[6]